# Low-resource deployment
## Nvidia OpenReasoning Nemotron 32B GGUF
Large Language Model · bartowski · 2,382 · 1
A quantized version of NVIDIA OpenReasoning-Nemotron-32B, produced with llama.cpp to reduce storage and compute requirements for easier deployment.

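The GGUF entries in this list are all loaded the same way. A minimal sketch using the llama-cpp-python bindings, assuming a quant file has already been downloaded locally; the file name and parameters below are illustrative, not confirmed paths:

```python
from llama_cpp import Llama

# Load a local GGUF quant; the path and file name are placeholders.
llm = Llama(
    model_path="./OpenReasoning-Nemotron-32B-Q4_K_M.gguf",
    n_ctx=4096,        # context window; raise it if you have the RAM
    n_gpu_layers=-1,   # offload all layers to GPU when one is available
)

out = llm("Explain GGUF quantization in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```

Smaller quants (Q4_K_M and below) trade some quality for memory, which is usually the right trade on laptops and edge devices.
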
## Nvidia OpenReasoning Nemotron 1.5B GGUF
Large Language Model · bartowski · 660 · 4
A quantized version of NVIDIA OpenReasoning-Nemotron-1.5B, optimized with llama.cpp for better efficiency and performance across different hardware.

## Openreasoning Nemotron 32B Q4 K M GGUF
Large Language Model · Transformers · Supports Multiple Languages · sm54 · 127 · 1
A GGUF-format conversion of nvidia/OpenReasoning-Nemotron-32B that can be used with llama.cpp.

## Thedrummer Cydonia 24B V4 GGUF
Large Language Model · bartowski · 3,869 · 4
A llama.cpp-based quantization of TheDrummer's Cydonia-24B-v4 that can run efficiently on devices with limited resources.

## Voxtral Mini 3B 2507 Transformers
Apache-2.0 · Audio-to-Text · Transformers · Supports Multiple Languages · MohamedRashad · 416 · 2
Voxtral Mini is an enhanced version of Ministral 3B with advanced audio input capabilities, performing strongly in speech transcription, translation, and audio understanding.

## Kyutai Helium 1 2b GGUF
Large Language Model · Transformers · Supports Multiple Languages · tensorblock · 154 · 1
A GGUF-format model file based on kyutai/helium-1-2b, quantized by TensorBlock, supporting multiple languages.

## LFM2 1.2B MLX Bf16
Other · Large Language Model · Transformers · Supports Multiple Languages · lmstudio-community · 192.07k · 1
LFM2-1.2B is a 1.2B-parameter multilingual text generation model from LiquidAI, optimized for Apple Silicon chips.

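MLX builds like this one (and the other mlx-community/lmstudio-community entries below) run on Apple Silicon through the mlx-lm package. A minimal sketch; the repo id is a guess based on the listing title, and any MLX-format repo id or local path works in its place:

```python
from mlx_lm import load, generate

# Load an MLX-format model from the Hugging Face hub (Apple Silicon only).
# The exact repo id below is an assumption; substitute the real one or a local path.
model, tokenizer = load("lmstudio-community/LFM2-1.2B-MLX-bf16")

prompt = "Summarize what MLX is in two sentences."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```
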
## Wr30a Deep 7B 0711 I1 GGUF
Apache-2.0 · Image-to-Text · Transformers · Supports Multiple Languages · mradermacher · 262 · 1
A quantized version of prithivMLmods/WR30a-Deep-7B-0711, supporting multiple languages and suited to tasks such as text generation and image captioning.

## Huihui Gemma 3n E2B It Abliterated GGUF
Large Language Model · Transformers · English · mradermacher · 583 · 1
A statically quantized version of the Gemma-3n-E2B-it model, supporting various speech and text processing tasks.

## Diffucoder 7B Cpgrpo 8bit
Large Language Model · Other · mlx-community · 272 · 2
DiffuCoder-7B-cpGRPO-8bit is a code generation model based on apple/DiffuCoder-7B-cpGRPO and converted to MLX format, giving developers an efficient code generation tool.

## Unireason Qwen3 14B RL GGUF
Apache-2.0 · Large Language Model · Transformers · English · mradermacher · 272 · 1
A statically quantized version of UniReason-Qwen3-14B-RL, suitable for text generation and mathematical reasoning research.

## Gemma 3n E2B GGUF
Large Language Model · Transformers · English · mradermacher · 207 · 0
A statically quantized version of the Google Gemma-3n-E2B model, offering a range of quantization types to balance size and performance.

## Gemma 3n E4B It MLX Bf16
Large Language Model · Transformers · lmstudio-community · 130.21k · 3
Gemma-3n-E4B-it is a Google-developed model converted to MLX (bf16) and particularly suitable for Apple Silicon devices.

## Delta Vector Austral 70B Winton GGUF
Apache-2.0 · Large Language Model · English · bartowski · 791 · 1
A quantized version of Delta-Vector's Austral-70B-Winton. Quantization cuts storage and compute requirements while preserving good quality, making the model usable in resource-constrained settings.

## Gama 12b I1 GGUF
Large Language Model · Transformers · Supports Multiple Languages · mradermacher · 559 · 1
A quantized version of Gama-12B offering files in a range of quantization types, suitable for text generation and supporting English and Portuguese.

## Gama 12b GGUF
Large Language Model · Transformers · Supports Multiple Languages · mradermacher · 185 · 1
Gama-12B is a large language model supporting multiple languages, offering various quantized versions to meet different performance and precision requirements.

## Longwriter Zero 32B I1 GGUF
Apache-2.0 · Large Language Model · Transformers · Supports Multiple Languages · mradermacher · 135 · 1
A quantized version of THU-KEG/LongWriter-Zero-32B that supports both Chinese and English and is suited to long-context scenarios such as reinforcement learning and long-form writing.

## Skywork Skywork SWE 32B GGUF
Apache-2.0 · Large Language Model · bartowski · 921 · 2
Skywork-SWE-32B is a 32B-parameter large language model, quantized with llama.cpp imatrix quantization so that it can run efficiently in resource-constrained environments.

## Nvidia AceReason Nemotron 1.1 7B GGUF
Other · Large Language Model · Supports Multiple Languages · bartowski · 1,303 · 1
A quantized version of the NVIDIA AceReason-Nemotron-1.1-7B model that improves running efficiency across different hardware while largely preserving performance and quality.

## Openbuddy OpenBuddy R1 0528 Distill Qwen3 32B Preview0 QAT GGUF
Apache-2.0 · Large Language Model · Supports Multiple Languages · bartowski · 720 · 1
A quantized version of OpenBuddy-R1-0528-Distill-Qwen3-32B-Preview0-QAT that runs more efficiently across different hardware.

## Qwen3 Embedding 0.6B Onnx Uint8
Apache-2.0 · Text Embedding · electroglyph · 112 · 8
A uint8 ONNX quantization of Qwen/Qwen3-Embedding-0.6B that reduces model size while maintaining retrieval performance.

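ONNX exports like this one can be served with onnxruntime on CPU. A minimal sketch; the ONNX file name is a placeholder, and the input/output names and shapes depend on how the export was done, so they are checked at runtime rather than hard-coded:

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Tokenizer comes from the base model; the ONNX file name below is a placeholder.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
sess = ort.InferenceSession("qwen3-embedding-0.6b-uint8.onnx")

enc = tok("GGUF vs ONNX for low-resource deployment", return_tensors="np")
# Input names vary by export; feed only the tensors the session actually expects.
feed = {i.name: enc[i.name] for i in sess.get_inputs() if i.name in enc}
outputs = sess.run(None, feed)
print([o.shape for o in outputs])  # embedding tensor layout depends on the export
```
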
## Wan2.1 T2V 14B FusionX VACE GGUF
Apache-2.0 · Text-to-Video · English · QuantStack · 461 · 3
A quantized text-to-video model, converted from its base model, that supports a range of video generation tasks.

## Wan2.1 T2V 14B FusionX GGUF
Apache-2.0 · Text-to-Video · English · QuantStack · 563 · 2
A quantized text-to-video model converted from its base model to GGUF format for use in ComfyUI, adding another option for text-to-video generation.

## Deepseek R1 0528 Qwen3 8B 6bit
MIT · Large Language Model · mlx-community · 582 · 1
A 6-bit quantized version converted from the DeepSeek-R1-0528-Qwen3-8B model, suitable for text generation tasks in the MLX framework.

## Blitzar Coder 4B F.1 GGUF
Apache-2.0 · Large Language Model · Transformers · prithivMLmods · 267 · 1
Blitzar-Coder-4B-F.1 is an efficient multilingual coding model fine-tuned from Qwen3-4B, supporting more than 10 programming languages with strong code generation, debugging, and reasoning capabilities.

## Home Llama 3.2 3B
Other · Large Language Model · Safetensors · Supports Multiple Languages · acon96 · 405 · 1
Home Llama 3.2 3B is fine-tuned from Meta's Llama 3.2 3B and is designed for controlling home devices and handling basic Q&A tasks.

## Echelon AI Med Qwen2 7B GGUF
Large Language Model · featherless-ai-quants · 183 · 1
Provides GGUF quantized files for the Echelon-AI/Med-Qwen2-7B model, backed by Featherless AI, with the aim of improving performance and reducing operating costs.

## Gemma 3n E4B It
Image-to-Text · Transformers · google · 1,690 · 81
Gemma 3n is a lightweight, state-of-the-art open multimodal model family from Google, built on the same research and technology as the Gemini models, and supports text, audio, and visual inputs.

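For the unquantized Transformers release, a minimal multimodal sketch using the generic image-text-to-text pipeline; this assumes a recent transformers version with Gemma 3n support, and the image URL is a placeholder:

```python
from transformers import pipeline

# Load the instruction-tuned Gemma 3n checkpoint; needs a recent transformers release.
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-E4B-it",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    },
]

out = pipe(text=messages, max_new_tokens=64, return_full_text=False)
print(out[0]["generated_text"])
```
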
## Bielik 11B V2.6 Instruct GGUF
Apache-2.0 · Large Language Model · Transformers · speakleash · 206 · 5
Bielik-11B-v2.6-Instruct is a Polish large language model developed by SpeakLeash and ACK Cyfronet AGH, fine-tuned from Bielik-11B-v2 for instruction-following tasks.

## Phi 3.5 Mini Instruct
MIT · Large Language Model · Transformers · Other · Lexius · 129 · 1
Phi-3.5-mini-instruct is a lightweight, state-of-the-art open model built on the datasets used for Phi-3, with a focus on high-quality, reasoning-dense data. It supports a 128K-token context length and offers strong multilingual and long-context capabilities.

## Deepseek R1 0528 GGUF
MIT · Large Language Model · lmstudio-community · 1,426 · 5
A quantized model based on DeepSeek-R1-0528, focused on text generation and offering a more efficient way to run the model.

## Infly Inf O1 Pi0 GGUF
Large Language Model · Supports Multiple Languages · bartowski · 301 · 1
A quantized version of the infly/inf-o1-pi0 model, produced with llama.cpp's imatrix quantization and supporting multilingual text generation.

## Medgemma 4b It GGUF
Other · Text-to-Image · Transformers · second-state · 564 · 1
medgemma-4b-it is a multimodal model focused on the medical field, capable of processing image and text inputs, and suitable for medical scenarios such as radiology and clinical reasoning.

## Devstral Small 2505 4bit DWQ
Apache-2.0 · Large Language Model · Supports Multiple Languages · mlx-community · 238 · 3
A 4-bit quantized language model in MLX format, suitable for text generation tasks.

## Facebook KernelLLM GGUF
Other · Large Language Model · bartowski · 5,151 · 2
KernelLLM is a large language model developed by Facebook. This version is quantized with llama.cpp's imatrix method and offers multiple quantization options to suit different hardware requirements.

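Repos like this ship many quant files, and you normally want just one of them. A sketch of downloading a single file and loading it; the repo id and file name are illustrative guesses, not confirmed paths:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one quant file instead of the whole repo; repo id and filename are assumptions.
gguf_path = hf_hub_download(
    repo_id="bartowski/facebook_KernelLLM-GGUF",
    filename="facebook_KernelLLM-Q4_K_M.gguf",
)

llm = Llama(model_path=gguf_path, n_ctx=2048)
out = llm("Write a CUDA kernel that adds two vectors.", max_tokens=256)
print(out["choices"][0]["text"])
```
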
## Verireason Qwen2.5 1.5B Grpo Small GGUF
Large Language Model · English · mradermacher · 48 · 1
A statically quantized version of the Nellyw888/VeriReason-Qwen2.5-1.5B-grpo-small model, focused on Verilog code generation and reasoning.

## A M Team AM Thinking V1 GGUF
Apache-2.0 · Large Language Model · bartowski · 671 · 1
A llama.cpp imatrix quantization of the a-m-team/AM-Thinking-v1 model, available in multiple quantization types and suited to text generation tasks.

## Qwen3 0.6B Llamafile
Apache-2.0 · Large Language Model · Mozilla · 250 · 1
Qwen3 is the latest generation of the Qwen large language model series; this 0.6B-parameter dense model brings major advances in reasoning, instruction following, agent capabilities, and multilingual support.

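A llamafile bundles the weights and a llama.cpp runtime into one executable. Once it is running it exposes an OpenAI-compatible endpoint on localhost (port 8080 by default), so it can be queried from Python with the openai client. A sketch assuming those defaults; the model name is essentially ignored by the local server:

```python
from openai import OpenAI

# Point the OpenAI client at the locally running llamafile server (default port assumed).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3-0.6b",  # label only; the llamafile serves whatever model it was built with
    messages=[{"role": "user", "content": "Give one tip for running LLMs on a laptop."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```
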
## Thedrummer Rivermind Lux 12B V1 GGUF
Large Language Model · bartowski · 1,353 · 1
A 12B-parameter large language model processed with llama.cpp's imatrix quantization, offering multiple quantized versions to accommodate different hardware requirements.

## Gemma 3 4b It 4bit DWQ
Large Language Model · mlx-community · 2,025 · 1
A 4-bit DWQ-quantized MLX conversion of the Google Gemma-3-4b-it model, providing efficient text generation.